Survival of the Fittest: Cox Regression

Capstone 6940

Hazardous Conditions: Kayla Boyd & Hermela Shimelis

Modeling of Survival after Chemotherapy for Colon Cancer

By Hermela Shimelis

Introduction

  • The colon cancer data is a built-in data set in the survival R package [2].

  • The data set includes 929 subjects with stage B/C colon cancer who were randomized to three treatment groups and then followed for 8 years.

    • Observation, Levamisole (Lev), Levamisole + 5-FU
  • The time to death or recurrence is given in days. The data set is filtered to evaluate time to death (etype = 2).

  • Objective: Model the relationship between survival time and treatment
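The filtering step can be sketched as follows. This is a minimal sketch, assuming the `colon` data frame shipped with the survival package and its `etype` coding (2 = death); the name `colon_surv` matches the Kaplan-Meier code later in the deck, but the author's exact preprocessing is not shown.

```r
library(survival)

# `colon` holds two rows per subject: etype 1 = recurrence, etype 2 = death.
# Keeping etype == 2 leaves one time-to-death record per subject.
colon_surv <- subset(colon, etype == 2)

nrow(colon_surv)        # number of subjects
table(colon_surv$rx)    # subjects per treatment arm
```

The arm counts should line up with the group sizes reported in the patient characteristics table.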

Predictors

Category                  Variables
Treatment                 Observation (no treatment)
                          Levamisole (Lev)
                          Levamisole + 5-FU
Patient characteristics   Age
                          Sex
Tumor characteristics     Colon perforation and obstruction
                          Adherence to nearby organs
                          Differentiation of tumor (well, moderate, poor)
                          Extent of local spread
                          More than 4 positive lymph nodes
Other                     Time from surgery to registration
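The model tables later in the deck use labelled predictors named `differentiation` and `local_spread`, with "moderate" and "contiguous" as reference levels. The deck does not show how these were derived, so the following is only one plausible sketch, assuming they come from the numeric `differ` and `extent` columns of `colon` (codes as listed in the supplementary data).

```r
library(survival)

df <- subset(colon, etype == 2)

# Map the numeric codes to labels. factor() sorts labels alphabetically,
# which makes "moderate" and "contiguous" the reference levels.
df$differentiation <- factor(c("well", "moderate", "poor")[df$differ])
df$local_spread <- factor(
  c("submucosa", "muscle", "serosa", "contiguous")[df$extent]
)

levels(df$differentiation)
levels(df$local_spread)
```

Missing codes in `differ` stay missing after the recoding, so rows with unknown differentiation are dropped by `coxph()` whenever that predictor is in the model.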

Patient characteristics are similar between the three groups

Values are n (%) unless noted.

Characteristic                            Observation    Levamisole     Levamisole + 5-FU
                                          N=315          N=310          N=304
Demographics
    Male                                  166 (52.3)     177 (57.1)     141
    Median age (years) [IQR]              60 [53,68]     61 [53,69]     61 [52,70]
Cancer characteristics
    Colon obstruction                     63 (20.0)      63 (20.3)      54 (17.8)
    Colon perforation                     9 (2.9)        10 (3.2)       8 (2.6)
    Adherence to nearby organs            47 (14.9)      49 (15.8)      39 (12.8)
Differentiation of tumor
    Well                                  27 (8.6)       37 (11.9)      29 (9.5)
    Moderate                              236 (74.9)     229 (73.9)     221 (72.7)
    Poor                                  52 (16.5)      44 (14.2)      54 (17.8)
Extent of local spread
    Contiguous                            20 (6.3)       12 (3.9)       11 (3.6)
    Muscle                                38 (12.1)      36 (11.6)      32 (10.5)
    Serosa                                249 (79.0)     259 (83.5)     251 (82.6)
    Submucosa                             8 (2.5)        3 (1.0)        10 (3.3)
More than 4 lymph nodes with cancer       87 (27.6)      89 (28.7)      79 (26.0)
Short time from surgery to registration   91 (28.9)      80 (25.8)      76 (25.0)

Kaplan-Meier Curve Stratified by Treatment Groups

library(survival)   # Surv(), survfit()
library(survminer)  # ggsurvplot()
fit <- survfit(Surv(time, status) ~ rx, data = colon_surv)
ggsurvplot(fit, data = colon_surv, risk.table = TRUE)

Cox Regression Models

# Model 1: null (intercept-only) model used as the baseline
m0 <- coxph(Surv(time, status) ~ 1, data = df)
summary_m0 <- summary(m0)
c_index_m0 <- concordance(m0)
cat("Concordance of the base model:", c_index_m0$concordance)

Concordance of the base model: 0.5

# Model 2: treatment only
Model_2 <- coxph(Surv(time, status) ~ rx, data = df)

Characteristic     HR     95% CI        p-value
rx
    Obs            (reference)
    Lev            0.97   0.78, 1.21    0.8
    Lev+5FU        0.69   0.55, 0.87    0.002
Concordance = 0.536
HR = Hazard Ratio, CI = Confidence Interval
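The hazard ratios and confidence intervals in these tables are the exponentiated Cox coefficients. A minimal sketch for the treatment-only model, assuming `df` is the death-only subset of `colon` (the deck does not show how `df` was built):

```r
library(survival)

df <- subset(colon, etype == 2)  # assumed construction of `df`
fit <- coxph(Surv(time, status) ~ rx, data = df)

# coxph() estimates log hazard ratios; exponentiate to get HRs and their CIs.
hr <- exp(coef(fit))
ci <- exp(confint(fit))  # 95% Wald interval by default

round(cbind(HR = hr, ci), 2)
```

Each row compares one treatment arm with the reference level of `rx` (observation).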
# Model 3: treatment plus all candidate covariates
Model_3 <- coxph(Surv(time, status) ~ rx + age + sex + perfor + adhere + surg + obstruct + differentiation + node4 + local_spread, data = df)

Characteristic     HR     95% CI        p-value
rx
    Obs            (reference)
    Lev            0.98   0.79, 1.22    0.9
    Lev+5FU        0.69   0.54, 0.87    0.002
age                1.01   1.00, 1.02    0.083
sex                1.04   0.86, 1.26    0.7
perfor             1.00   0.59, 1.70    >0.9
adhere             1.18   0.92, 1.53    0.2
surg               1.27   1.03, 1.55    0.022
obstruct           1.33   1.06, 1.68    0.015
differentiation
    moderate       (reference)
    poor           1.43   1.13, 1.82    0.003
    well           1.08   0.78, 1.50    0.6
node4              2.55   2.10, 3.09    <0.001
local_spread
    contiguous     (reference)
    muscle         0.39   0.23, 0.64    <0.001
    serosa         0.64   0.43, 0.94    0.023
    submucosa      0.29   0.10, 0.83    0.021
Concordance = 0.674
HR = Hazard Ratio, CI = Confidence Interval
# Model 4: variables retained by stepwise selection
Model_4 <- coxph(Surv(time, status) ~ rx + age + surg + obstruct +
    differentiation + node4 + local_spread, data = df)

Characteristic     HR     95% CI        p-value
rx
    Obs            (reference)
    Lev            0.99   0.80, 1.23    >0.9
    Lev+5FU        0.69   0.54, 0.87    0.002
age                1.01   1.00, 1.02    0.069
surg               1.28   1.04, 1.56    0.018
obstruct           1.33   1.06, 1.67    0.015
differentiation
    moderate       (reference)
    poor           1.45   1.15, 1.84    0.002
    well           1.07   0.77, 1.48    0.7
node4              2.53   2.09, 3.07    <0.001
local_spread
    contiguous     (reference)
    muscle         0.37   0.23, 0.61    <0.001
    serosa         0.61   0.41, 0.89    0.010
    submucosa      0.27   0.09, 0.76    0.014
Concordance = 0.672
HR = Hazard Ratio, CI = Confidence Interval
Schoenfeld Residuals Test Results

Variable           chisq     df    p
rx                 2.335     2     0.311
age                0.549     1     0.459
surg               0.020     1     0.888
obstruct           6.148     1     0.013
differentiation    17.459    2     <0.001
node4              5.662     1     0.017
local_spread       7.083     3     0.069
GLOBAL             37.525    11    <0.001
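A table of this form can be produced with `cox.zph()` from the survival package. This is a hedged sketch on a reduced model, assuming `df` is the death-only subset of `colon`; the derived predictors `differentiation` and `local_spread` are left out because their construction is not shown in the deck.

```r
library(survival)

df <- subset(colon, etype == 2)  # assumed construction of `df`
fit <- coxph(Surv(time, status) ~ rx + age + surg + obstruct + node4, data = df)

# Test of proportional hazards based on scaled Schoenfeld residuals.
zph <- cox.zph(fit)

# One row per model term plus a GLOBAL row; columns are chisq, df and p.
round(zph$table, 3)
```

A small p-value for a term suggests its effect changes over follow-up time, which is the motivation for stratifying on that variable rather than estimating a single hazard ratio for it.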
# Model 5: stratified by obstruct and differentiation
Model_5 <- coxph(Surv(time, status) ~ rx + age + surg + strata(obstruct) + strata(differentiation) + node4 +
              local_spread, data = df)

Characteristic     HR     95% CI        p-value
rx
    Obs            (reference)
    Lev            0.98   0.79, 1.22    0.9
    Lev+5FU        0.71   0.56, 0.89    0.003
age                1.01   1.00, 1.02    0.034
surg               1.30   1.06, 1.59    0.012
node4              2.50   2.06, 3.04    <0.001
local_spread
    contiguous     (reference)
    muscle         0.34   0.21, 0.56    <0.001
    serosa         0.58   0.39, 0.84    0.004
    submucosa      0.24   0.08, 0.67    0.007
Concordance = 0.674
HR = Hazard Ratio, CI = Confidence Interval

Stratified Model Meets Proportional Hazards Assumption

Schoenfeld Residuals Test Results

Variable           chisq     df    p
rx                 2.001     2     0.368
age                0.670     1     0.413
surg               0.014     1     0.905
node4              4.288     1     0.038
local_spread       5.298     3     0.151
GLOBAL             12.411    8     0.134

Model Evaluation Metrics

Model      Description                    AIC         BIC         C_Index
Model 1    Base model                     5860.383    5860.383    0.500
Model 2    Treatment                      5852.236    5860.463    0.536
Model 3    Full variables                 5741.401    5798.993    0.674
Model 4    Stepwise-selected variables    5737.261    5782.511    0.672
Model 5    Stratified                     4567.829    4600.739    0.674

Note: Model 5 is fit with a stratified partial likelihood, so its AIC and BIC are not directly comparable with those of Models 1-4.
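The three metrics can be collected with standard generics. The sketch below uses two small illustrative models rather than the deck's Models 1-5, and assumes `df` is the death-only subset of `colon`; for `coxph` fits, `BIC()` penalizes by the log of the number of events rather than the number of rows.

```r
library(survival)

df <- subset(colon, etype == 2)  # assumed construction of `df`

# Illustrative models only; not the exact specifications used in the deck.
models <- list(
  treatment = coxph(Surv(time, status) ~ rx, data = df),
  adjusted  = coxph(Surv(time, status) ~ rx + age + node4, data = df)
)

metrics <- data.frame(
  AIC     = sapply(models, AIC),
  BIC     = sapply(models, BIC),
  C_Index = sapply(models, function(m) concordance(m)$concordance)
)

round(metrics, 3)
```

Lower AIC and BIC indicate a better trade-off between fit and complexity among models fit to the same data with the same likelihood; a higher C-index indicates better discrimination.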

K-fold Cross Validation

Original c-index: 0.65 
Mean cross-validated c-Index: 0.64 
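The deck reports the cross-validated result without the code behind it. One way to compute a k-fold cross-validated c-index is sketched below; the number of folds, the seed, and the reduced set of predictors are assumptions chosen for illustration, not the author's settings.

```r
library(survival)

set.seed(6940)  # arbitrary seed so the fold assignment is reproducible
df <- subset(colon, etype == 2)  # assumed construction of `df`
k <- 5                           # assumed number of folds
fold <- sample(rep(seq_len(k), length.out = nrow(df)))

cv_c <- sapply(seq_len(k), function(i) {
  train <- df[fold != i, ]
  test  <- df[fold == i, ]
  fit <- coxph(Surv(time, status) ~ rx + age + node4, data = train)
  # Linear predictor for held-out subjects; higher values mean higher risk.
  test$lp <- predict(fit, newdata = test, type = "lp")
  # reverse = TRUE: a higher risk score should go with shorter survival.
  concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)$concordance
})

mean(cv_c)  # mean cross-validated c-index
```

A cross-validated value close to the c-index computed on the full data, as reported above, suggests the model is not strongly overfit.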

Conclusions

  • Treatment with Levamisole + 5-FU decreases the hazard of death from colon cancer by 29.5% relative to observation (HR = 0.71, 95% CI: 0.56-0.89; p = 0.0035).

  • Having more than 4 tumor-positive lymph nodes increases the hazard of death by 150.3% (HR = 2.50, 95% CI: 2.06-3.04; p < 0.0001).

  • A long interval from surgery to trial registration is associated with a 30% increase in the hazard of death (HR = 1.30, 95% CI: 1.06-1.59; p = 0.01).

  • Compared with patients whose tumor has spread to contiguous structures, patients with spread limited to the submucosa, muscle, or serosa have a lower hazard of death by 76% (p = 0.007), 66% (p < 0.001), and 42% (p = 0.004), respectively.

  • The concordance of the model (0.67) indicates moderate discriminative ability: for about two-thirds of comparable patient pairs, the model correctly ranks which patient survives longer.

  • Other variables that were not included in the study may contribute to survival time.
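The percentage changes quoted in these conclusions follow from the hazard ratios as (HR - 1) x 100. A small sketch of that arithmetic, using the rounded hazard ratios from the Model 5 table (the helper name `pct_change` is hypothetical):

```r
# Percent change in hazard implied by a hazard ratio:
# negative values are reductions, positive values are increases.
pct_change <- function(hr) (hr - 1) * 100

# Rounded hazard ratios taken from the Model 5 table.
hr_model5 <- c(lev_5fu = 0.71, node4 = 2.50, surg = 1.30,
               submucosa = 0.24, muscle = 0.34, serosa = 0.58)

round(pct_change(hr_model5))
```

Because the table reports hazard ratios to two decimals, percentages computed this way can differ slightly from those based on unrounded estimates.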

Supplementary data

Predictors

id: id
study: 1 for all patients
rx: Treatment - Obs(ervation), Lev(amisole), Lev(amisole)+5-FU
sex: 1=male
age: in years
obstruct: obstruction of colon by tumour
perfor: perforation of colon
adhere: adherence to nearby organs
nodes: number of lymph nodes with detectable cancer
time: days until event or censoring
status: censoring status
differ: differentiation of tumour (1=well, 2=moderate, 3=poor)
extent: Extent of local spread (1=submucosa, 2=muscle, 3=serosa, 4=contiguous structures)
surg: time from surgery to registration (0=short, 1=long)
node4: more than 4 positive lymph nodes
etype: event type: 1=recurrence,2=death

References

[1] T. M. Therneau and P. M. Grambsch, Modeling Survival Data: Extending the Cox Model. New York: Springer, 2000.
[2] T. M. Therneau, A Package for Survival Analysis in R. 2024. Available: https://CRAN.R-project.org/package=survival